Foreground and Background Lexicons and Word Sense Disambiguation for Information Extraction

Author

  • Adam Kilgarriff
Abstract

In recent years, lexicon acquisition from machine-readable dictionaries and corpora has been a dynamic field of research. However, it has not always been evident how lexical information so acquired can be used, or how it relates to more structured meaning representations. In this paper I look at this issue in relation to one particular NLP task, Information Extraction (hereafter IE), and one subtask for which both lexical and general knowledge are required, Word Sense Disambiguation (WSD).

The argument is as follows. For an IE task, the output formalism (that is, the database fields or templates which the system is to fill) specifies the object types and relations that the system is to find out about: the ‘ontology’. An IE task operates in a specific domain. The task requires the key terms of that domain, the ‘foreground lexicon’, to be tightly bound to the ontology. This is a task that calls for human input. For all other vocabulary, the ‘background lexicon’, a far shallower semantics will be sufficient. This shallow semantics can be obtained automatically from sources such as machine-readable dictionaries and domain corpora.

The foreground and background lexicons are suited to different kinds of WSD strategy. For the background lexicon, statistical methods for coarse-grained disambiguation are appropriate. For the foreground lexicon, WSD will occur as a by-product of finding a coherent semantic interpretation of an input sentence, in which all arguments are of the appropriate type.

Once the foreground/background distinction is developed, there is a good match between what is possible, given the state of the art in WSD and acceptable levels of human input, and what is required for high-quality IE.

The two-tier approach has been adopted by a number of IE systems.
The POETIC (Evans et al., 1996) and Sussex MUC-5 (Gaizauskas, Cahill, and Evans, 1994) systems used a hand-crafted foreground lexicon and the Alvey Tools lexicon (Carroll and Grover, 1989) as a background lexicon for syntactic …
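The two WSD strategies described above can be illustrated with a toy sketch. Everything in it is a hypothetical assumption made for illustration, not the paper's implementation: the lexicon entries, the ontology types (`Organization`, `Person`, `Weapon`, `Artifact`), the sense labels, the counts, and the function names `lexicon_tier`, `disambiguate_background` and `interpret`. The abstract only says that "statistical methods" suit the background lexicon; naive Bayes with add-one smoothing is swapped in here as one such method. For the foreground lexicon, the sketch keeps only those readings of a subject-verb-object triple in which every argument has the type the verb sense requires.

```python
import math
from collections import Counter
from itertools import product

# All lexicon entries, types and counts below are hypothetical toy data.

# --- Foreground lexicon: key domain terms, hand-bound to ontology types ---
# Each verb sense lists the ontology types its (subject, object) must have.
FOREGROUND_VERBS = {
    "fire": {
        "dismiss": ("Organization", "Person"),
        "discharge_weapon": ("Person", "Weapon"),
    },
}
# Nouns map to the set of ontology types they can denote (possibly ambiguous).
FOREGROUND_NOUNS = {
    "acme": {"Organization"},
    "smith": {"Person"},
    "gun": {"Weapon"},
    "board": {"Organization", "Artifact"},
}

# --- Background lexicon: shallow, coarse-grained senses ---
# Each sense has a prior count and context-word counts, of the kind that
# might be gathered from a domain corpus or dictionary glosses.
BACKGROUND = {
    "interest": {
        "money": (30, Counter({"rate": 12, "loan": 8, "bank": 6})),
        "attention": (10, Counter({"show": 4, "great": 3, "public": 2})),
    },
}


def lexicon_tier(word):
    """Route a word to the foreground or background lexicon."""
    if word in FOREGROUND_VERBS or word in FOREGROUND_NOUNS:
        return "foreground"
    return "background"


def disambiguate_background(word, context):
    """Pick the coarse sense with the highest naive Bayes log score.

    Uses add-one smoothing; context words never seen with any sense are ignored.
    """
    senses = BACKGROUND[word]
    total = sum(prior for prior, _ in senses.values())
    vocab = {w for _, counts in senses.values() for w in counts}
    best, best_score = None, -math.inf
    for sense, (prior, counts) in senses.items():
        n = sum(counts.values())
        score = math.log(prior / total)
        for w in context:
            if w in vocab:
                score += math.log((counts[w] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = sense, score
    return best


def interpret(subj, verb, obj):
    """Return every (verb_sense, subj_type, obj_type) reading in which both
    arguments have the types the verb sense requires."""
    readings = []
    for sense, (need_subj, need_obj) in FOREGROUND_VERBS[verb].items():
        for s_type, o_type in product(FOREGROUND_NOUNS[subj], FOREGROUND_NOUNS[obj]):
            if (s_type, o_type) == (need_subj, need_obj):
                readings.append((sense, s_type, o_type))
    return readings


# Example usage
print(lexicon_tier("board"), lexicon_tier("interest"))  # foreground background
print(disambiguate_background("interest", ["low", "rate", "loan"]))  # money
print(interpret("board", "fire", "smith"))  # [('dismiss', 'Organization', 'Person')]
```

For *board fired Smith*, only one combination type-checks, so the verb sense (`dismiss`) and the noun reading of *board* (`Organization`) are selected together, as a by-product of finding a coherent interpretation. A real system would also need a parser to supply the triple; the sketch takes it as given.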

Similar resources

Cross-lingual WSD for Translation Extraction from Comparable Corpora

We propose a data-driven approach to enhance translation extraction from comparable corpora. Instead of resorting to an external dictionary, we translate source vector features by using a cross-lingual Word Sense Disambiguation method. The candidate senses for a feature correspond to sense clusters of its translations in a parallel corpus and the context used for disambiguation consists of the ...

Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation

We present a system for verbal Word Sense Disambiguation (WSD) that is able to exploit additional information from parallel texts and lexicons. It is an extension of our previous WSD method (Dušek et al., 2014), which gave promising results but used only monolingual features. In the follow-up work described here, we have explored two additional ideas: using English-Czech bilingual resources (as...

Building Specialized Bilingual Lexicons Using Word Sense Disambiguation

This paper presents an extension of the standard approach used for bilingual lexicon extraction from comparable corpora. We study the ambiguity problem revealed by the seed bilingual dictionary used to translate context vectors and augment the standard approach by a Word Sense Disambiguation process. Our aim is to identify the translations of words that are more likely to give the best represen...

A Review Of Literature On Word Sense Disambiguation

Lexical ambiguity is a fundamental characteristic of language. Words can have more than one distinct meaning. Word sense disambiguation is defined as the problem of computationally determining which “sense” of a word is correct in a given context. Word sense disambiguation is a task of classification where word senses are the classes, the context provides the evidence, and each occurrence of a word...

Journal title:
  • CoRR

Volume: cmp-lg/9712007

Publication date: 1997